Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Shotgun Metagenomic Data Analysis ◾ 321

We need to index the sorted BAM files using the “samtools index” command.

for i in *.sorted; do
    samtools index -@ 4 "${i}"
done

Then, we will use “samtools idxstats” to generate some statistics from the sorted BAM files.

samtools idxstats ERR1823587_healthy.bam.sorted > ERR1823587_healthy_stat.txt
samtools idxstats ERR1823601_moderate.bam.sorted > ERR1823601_moderate_stat.txt
samtools idxstats ERR1823608_severe.bam.sorted > ERR1823608_severe_stat.txt
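The three commands above follow one naming pattern, so they could also be written as a loop. The following is a minimal sketch, assuming every sorted BAM file in the working directory ends in “.bam.sorted”. The helper name stat_name is hypothetical and is not part of samtools.

```shell
# Hypothetical helper: map "<sample>.bam.sorted" to "<sample>_stat.txt".
stat_name() {
    printf '%s_stat.txt\n' "${1%.bam.sorted}"
}

for bam in *.bam.sorted; do
    # If the glob matched nothing, the literal pattern is not a file; skip it.
    [ -e "$bam" ] || continue
    samtools idxstats "$bam" > "$(stat_name "$bam")"
done
```

With the file names used in this chapter, the loop would write the same three “_stat.txt” files as the individual commands.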

The output of the “samtools idxstats” command is a TAB-delimited file in which each line consists of the reference sequence name, sequence length, number of mapped read-segments, and number of unmapped read-segments. From those files, we can generate an abundance table similar to the OTU (operational taxonomic unit) table generated by clustering the amplicon-based reads in Chapter 7. For this purpose, we can use the “get_count_table.py” script, which can be cloned from GitHub using the following command:

git clone https://github.com/metajinomics/mapping_tools.git
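To see what the four idxstats columns hold before passing the files to any script, they can be inspected with awk. The following is a minimal sketch on a made-up file: the contig names, lengths, and counts are hypothetical, and the file names example_stat.txt and example_count.txt are assumptions for illustration only. It is not a substitute for “get_count_table.py”, whose output layout may differ.

```shell
# Hypothetical idxstats-style content; contig names and counts are made up.
printf 'contig_1\t1500\t120\t0\ncontig_2\t800\t30\t2\n*\t0\t0\t45\n' > example_stat.txt

# Columns: 1 = reference name, 2 = length, 3 = mapped, 4 = unmapped.
# The final "*" row holds reads with no reference assigned, so it is skipped here.
awk -F'\t' '$1 != "*" { printf "%s\t%d\n", $1, $3 }' example_stat.txt > example_count.txt

# Sum the mapped read-segments over all rows as a quick sanity check.
total_mapped=$(awk -F'\t' '{ sum += $3 } END { print sum }' example_stat.txt)
echo "Total mapped read-segments: ${total_mapped}"
```

Running the same awk commands on one of the real “_stat.txt” files gives a quick check that the mapping step produced non-zero counts.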

Then, we can use the “get_count_table.py” script to generate an abundance table for each sample. The script is written for Python 2, so you may need to install Python 2 if it is not already on your computer.

python2 mapping_tools/get_count_table.py ERR1823587_healthy_stat.txt > ERR1823587_healthy_count.txt
python2 mapping_tools/get_count_table.py ERR1823601_moderate_stat.txt > ERR1823601_moderate_count.txt
python2 mapping_tools/get_count_table.py ERR1823608_severe_stat.txt > ERR1823608_severe_count.txt

cd ..

We will use the output of this script for binning in the next step.

8.2.7 Binning

Above, we discussed binning as the process of separating the sequences into bins that

represent the most likely taxa. There are many programs that can do this job including

metabat2, CONCOCT, and MaxBin. Here, we will use metabat2 as an example. Metabat2

is easier to install with Anaconda or Miniconda.

conda install -c bioconda metabat2

conda install -c bioconda/label/cf201901 metabat2
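After installation, it may be worth confirming that the program is reachable from the shell. The following is a minimal sketch, assuming the conda package places an executable named “metabat2” on the PATH of the active environment. The function name have_tool is hypothetical.

```shell
# Hypothetical helper: succeed if the named program is found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool metabat2; then
    echo "metabat2 found: $(command -v metabat2)"
else
    echo "metabat2 not found; check that the conda environment is active"
fi
```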